Abstract


The common product is a multiset-based signed binary operator (in the
mathematical sense of taking two arguments) that provides a
quantification of the similarity between two scalars, vectors,
functions, or virtually any other mathematical structure. This
operation has been shown to impose a stricter quantification of
similarity than more commonly employed alternatives, including the
cosine similarity and cross-correlation. In this work we study a
distance derived from the common product, as well as a respective
similarity index, both of which have limited support, therefore
reducing or eliminating the influence of outliers. Multiscale,
multi-order versions of this distance and similarity index are then
proposed. To illustrate the potential of the reported concepts and
methods, the multiscale similarity index is then applied to obtain a
respective least mean distance approximation method. The obtained
method, which is shown to have several interesting properties specific
to certain applications, is then illustrated and discussed.


“Distant stars, seemingly so similar.”


LdaFC 


1 Introduction 


Distance and similarity share many features in science and 
technology, playing an important role in many concepts, 
methods, and applications. While the Euclidean distance 
very probably corresponds to the most frequently adopted 
distance, the cosine similarity, the inner product and the 
Jaccard index tend to be frequently employed. The latter 
index is typically applied to binary or categorical data. 
More recently [1, 2, 3], through the consideration of
multiset concepts (e.g. [4, 5, 6, 7, 8, 9]), a generalization of
the Jaccard index has been obtained that can be applied
to real, possibly negative, values. Given that this index
has been shown [1] not to take into account the relative
interiority of the two compared objects, the index called
coincidence has been proposed [1, 2, 3] that integrates
information from both the real-valued Jaccard and the
interiority indices. These two indices, both of which are based
on the common product, have been shown to allow
enhanced performance in several tasks, including template
matching [10], representation of data as complex networks
and highlighting of modular structure [11], as well as pattern
recognition [12].

The present work focuses on deriving a distance index
from the correlates of the common product, real-valued
Jaccard, and coincidence indices, which is called
coincidence distance. This index quantifies, up to a scale
parameter, the distance from a reference value $\tilde{c}$ only along
an interval $[\tilde{c} - w, \tilde{c} + w]$. In this manner, points that are
further away from the reference value are all considered to have
the same distance. This feature provides an interesting approach to
limit the influence of outliers on the distance estimation.
In other words, the distance quantification operates only
locally at a scale defined by the parameter w.

A respective similarity index capable of quantifying the
local similarity is then obtained that shares the same
ability to consider the similarity only within a window
$[x - w, x + w]$ around the reference value x to be compared.





A multiscale curve fitting method is then derived from
this similarity index that has the following interesting
properties: (i) it operates locally, ignoring points that
are further away than w, which are understood as outliers
to be left out; (ii) the definition of the window to
be considered can be readily controlled by the scaling
parameter w; (iii) the sharpness of the active region can be
conveniently controlled by a parameter D; (iv) provided
the original points are not too noisy, the fitting solution
can be effectively obtained by using the simple gradient
ascent approach; otherwise local peaks may occur, and
simulated annealing approaches or the Hough transform
(e.g. [13, 14]) can be considered.


2 The Common Product Distance 


The common product between two scalars x and y has
been defined [1, 2, 3, 15] as:

$$x \sqcap y = s_{xy} \min\{s_x x,\ s_y y\} \tag{1}$$

with $x \sqcap y = y \sqcap x$, where $s_x$ and $s_y$ are the signs of x
and y, and $s_{xy} = s_x s_y$.

This product has been derived from the extension of
multiset theory to real values [1, 2, 3, 15] and shown to
be directly related to the Kronecker delta function [3],
though providing a more practical quantification of the
similarity between two scalars.
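
A minimal Python sketch of the common product for scalars is given below. It assumes that the sign functions are the usual sign(x) and sign(y), with the joint sign taken as their product; the names `sign` and `common_product` are illustrative only and do not come from the cited works.

```python
def sign(v: float) -> int:
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def common_product(x: float, y: float) -> float:
    """Signed common product: s_xy * min(s_x * x, s_y * y).

    s_x * x and s_y * y are the absolute values of x and y, so the result
    is the smaller magnitude carrying the product of the two signs.
    """
    sx, sy = sign(x), sign(y)
    return sx * sy * min(sx * x, sy * y)


# Same signs give a positive result, opposite signs a negative one.
print(common_product(2.0, 5.0))    # smaller magnitude, positive
print(common_product(-3.0, 2.0))   # smaller magnitude, negative
```

Under these assumptions the operation is commutative and never exceeds the smaller of the two magnitudes.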

A better understanding of this product can be obtained
by fixing one of its arguments to a constant value. For
instance, in case the argument y in Equation 1 is fixed as
$y = \tilde{c} \neq 0$, the binary operation becomes a function of x.

Figure 1 illustrates the common product for x = 2,
i.e. $2 \sqcap y = y \sqcap 2$.





Figure 1: The common product binary operator with one of its
arguments fixed, i.e. x = 2, therefore implementing the single variable
function $2 \sqcap y = y \sqcap 2$ shown in this figure.


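Observe that $-\tilde{c} \le x \sqcap \tilde{c} \le \tilde{c}$. A normalized version of
the common product can be obtained as:

$$\frac{x \sqcap y}{\tilde{c}} = \frac{s_{xy} \min\{s_x x,\ s_y y\}}{\tilde{c}} \tag{2}$$

where one of the arguments is fixed at $\tilde{c}$, with
$-1 \le (x \sqcap y)/\tilde{c} \le 1$ and $\tilde{c} \neq 0$.

This version of the common product is shown in Figure 2(a).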

It is also possible to take the absolute value of the above
similarity index, yielding:

$$\frac{|x \sqcap y|}{\tilde{c}} = \frac{\min\{s_x x,\ s_y y\}}{\tilde{c}} \tag{3}$$

and now we have $0 \le |x \sqcap y|/\tilde{c} \le 1$. This function is
illustrated in Figure 2(b).
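
The normalization and absolute-value steps can be sketched in Python as follows. This is only an illustration: it assumes a positive fixed argument (2, as in the figures), and the helper names are hypothetical.

```python
def sign(v: float) -> int:
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def common_product(x: float, y: float) -> float:
    """Signed common product s_xy * min(s_x * x, s_y * y)."""
    sx, sy = sign(x), sign(y)
    return sx * sy * min(sx * x, sy * y)


def normalized_common_product(x: float, c: float) -> float:
    """Common product with one argument fixed at c > 0, scaled into [-1, 1]."""
    return common_product(x, c) / c


def absolute_common_product(x: float, c: float) -> float:
    """Absolute value of the normalized version, lying in [0, 1]."""
    return abs(normalized_common_product(x, c))


# Scan a few values of x with the fixed argument c = 2.
for x in (-7.0, -1.0, 0.0, 1.0, 5.0):
    print(x, normalized_common_product(x, 2.0), absolute_common_product(x, 2.0))
```

Both versions saturate once the magnitude of x exceeds the fixed argument, which is the property the later distance construction relies on.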






Figure 2: Successive transformations of the common product with
one argument kept constant at y = 2 (Fig. 1): scaled so as to have unit
height (a), absolute value (b), displaced to x = 2 (c), and with width 2 (d).


A shifted version of the previous function can now be
derived as:

$$\Delta(\tilde{c}, y) = \frac{\min\{s_{\tilde{c}} \tilde{c},\ s_{y-\tilde{c}} (y - \tilde{c})\}}{\tilde{c}} \tag{4}$$

with $0 \le \Delta(\tilde{c}, y) \le 1$. We also have that $\Delta(\tilde{c}, y) = \Delta(y, \tilde{c})$.
This function is shown in Figure 2(c).


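In order to allow the width of the above distance to
be controlled by a parameter w, implying width 2w, we
rewrite the previous equation as:

$$\Delta(\tilde{c}, y, w) = \frac{\min\{s_{\tilde{c}} \tilde{c},\ s_{(y-\tilde{c})\tilde{c}/w} \left((y - \tilde{c})\tilde{c}/w\right)\}}{\tilde{c}} \tag{5}$$
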


This distance is illustrated in Figure 2(d).
The function in Equation 5 can be brought back to its original
general binary form as:

$$\Delta(x, y, w) = \frac{\min\{s_x x,\ s_{(y-x)x/w} \left((y - x)x/w\right)\}}{x} \tag{6}$$

with $0 \le \Delta(x, y, w) \le 1$, $y \neq 0$ and $x \neq 0$.
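
As a reading aid, the general binary form can be simplified under the assumption (not stated explicitly in the text) that x > 0 and w > 0, by noting that the product of a real number and its sign is its absolute value:

```latex
\begin{align*}
\Delta(x, y, w)
  &= \frac{\min\{\,|x|,\ |(y - x)\,x/w|\,\}}{x}
   = \frac{\min\{\,x,\ |y - x|\,x/w\,\}}{x} \\
  &= \min\left\{1,\ \frac{|y - x|}{w}\right\},
  \qquad x > 0,\ w > 0 .
\end{align*}
```

In this form the distance grows linearly from 0 at y = x up to 1 at |y - x| = w, and remains at 1 beyond that window.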
Therefore, we have that the above distance has been
obtained all the way from the common product through
successive transformations. For this reason, in this work
this distance is called the common product distance.

The obtained distance has some specific properties that
make it an interesting option for certain applications.
First, it is particularly simple, involving just
a minimum and a division operation, plus the very simple
sign functions. Of particular importance, though, is the
saturation of the distance for values of y that are further
than w away from x. The latter feature implies that all the
more distant points are considered to have the maximum
distance of 1, therefore reducing the influence of outliers.
As such, the obtained distance can provide an interesting
alternative in situations in which the effect of outliers needs
to be reduced. Observe also that the obtained distance is
multiscale, in the sense that its width is controlled by the
parameter w.
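
The saturation property can be checked with a short Python sketch. It transcribes the binary form of the distance literally (minimum, sign functions and one division) and, as an assumption of this sketch, restricts the reference value x and the width w to positive values; the function names are illustrative.

```python
def sign(v: float) -> int:
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def common_product_distance(x: float, y: float, w: float) -> float:
    """min(s_x * x, s_a * a) / x with a = (y - x) * x / w.

    Restricted to x > 0 and w > 0 (an assumption of this sketch).
    """
    if x <= 0 or w <= 0:
        raise ValueError("this sketch assumes x > 0 and w > 0")
    scaled = (y - x) * x / w
    return min(sign(x) * x, sign(scaled) * scaled) / x


# Reference value x = 5 and width w = 4: the distance saturates at 1
# for every y further than w from x, however far away it is.
for y in (5.0, 7.0, 9.0, 100.0, 1e6):
    print(y, common_product_distance(5.0, y, 4.0))
```

Moving an outlier from 100 to one million leaves its distance unchanged, which is why such points cannot dominate an accumulated score.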

Given the interesting features obtained for $\Delta(x, y, w)$,
a respectively derived similarity index can be considered.
This can be immediately obtained by making:

$$s(x, y, w) = 1 - \Delta(x, y, w) \tag{7}$$



As with the common product distance, this similarity
index is also multiscale and can cope with outliers
if needed. Indeed, the similarity with outliers, i.e. points
that are further away than w, becomes zero, consequently
not being taken into account. This feature allows the
introduction of a new parameter D controlling the order (or
degree) of the similarity index, i.e.:

$$s(x, y, w, D) = [1 - \Delta(x, y, w)]^D \tag{8}$$

with $0 \le s(x, y, w, D) \le 1$.
Though the parameter D can take any real value, here 
we limit our attention to positive integer values of D. 
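
The effect of the order D can be illustrated with the following Python sketch. It assumes the saturating form min(1, |y - x| / w) for the distance, which agrees with the binary form for positive x and w; the function names are hypothetical.

```python
def common_product_distance(x: float, y: float, w: float) -> float:
    """Saturating distance min(1, |y - x| / w); assumes w > 0."""
    return min(1.0, abs(y - x) / w)


def common_product_similarity(x: float, y: float, w: float, D: int = 1) -> float:
    """Order-D similarity [1 - distance]^D, lying in [0, 1]."""
    return (1.0 - common_product_distance(x, y, w)) ** D


# Reference value 5 and width 4: the same point y = 6 becomes less
# similar as D increases, while points outside the window stay at zero.
for D in (1, 2, 8):
    print(D, common_product_similarity(5.0, 6.0, 4.0, D),
          common_product_similarity(5.0, 50.0, 4.0, D))
```

Larger values of D leave the peak at y = x untouched (similarity 1) and only steepen the decay around it.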


Figure 3 depicts the common product similarity for $\tilde{c} = 5$,
w = 4 and D = 1, 2, ..., 8.

Observe the effect of the parameter D on the sharpness
of the similarity peak. Therefore, stricter similarity
measurements are obtained by using larger values of D.


3 Application to Line Fitting 


Figure 3: The multiscale common product similarity for $\tilde{c} = 5$,
w = 4 and D = 1, 2, ..., 8.

In this section we provide an illustration of the potential
of the developed common product similarity with respect
to the important task of straight line fitting.
An affine function, or straight line, is henceforth understood
to correspond to:

$$y = mx + c \tag{9}$$

where the parameter m is commonly called the line
slope and the parameter c its respective intercept.

The polar line equation can also be of interest, being
expressed as:

$$\rho = x \cos(\theta) + y \sin(\theta) \tag{10}$$

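
The relation between the slope-intercept and polar forms can be sketched in Python as below. The conversion is standard analytic geometry rather than something taken from the text, and the function name is illustrative; note that the resulting rho is negative when c is negative.

```python
import math


def slope_intercept_to_polar(m: float, c: float) -> tuple[float, float]:
    """Convert y = m*x + c into (rho, theta) with rho = x*cos(theta) + y*sin(theta).

    The line can be written as -m*x + y = c; dividing by sqrt(1 + m^2)
    makes (-m, 1) a unit normal vector, whose angle is theta.
    """
    norm = math.hypot(m, 1.0)
    theta = math.atan2(1.0, -m)
    rho = c / norm
    return rho, theta


rho, theta = slope_intercept_to_polar(0.3, 1.0)
# Every point of the line satisfies the polar equation with the same rho.
for x in (-2.0, 0.0, 3.5):
    y = 0.3 * x + 1.0
    print(x, x * math.cos(theta) + y * math.sin(theta), rho)
```

The polar form avoids unbounded slopes for near-vertical lines, which is why it is the usual parametrization for the Hough transform.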


Fitting a line to a set of discrete points typically starts
with a table containing the respective x and y coordinates
of the latter. The traditional approach to line fitting has
been through the least squares method (e.g. [16]), though
the respective $L_1$ norm (absolute value) version, least
absolute deviations or LAV (e.g. [17]), has also been
applied (e.g. [16]). However, neither of these can cope
with outliers or has immediate parameters controlling
the function profile.
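
To make the sensitivity of least squares to outliers concrete, the following Python sketch fits a line by the standard closed-form least squares solution, first to noise-free points and then to the same points with one displaced value. The data are made up for illustration and the function name is hypothetical.

```python
def least_squares_line(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Ordinary least squares fit of y = m*x + c (closed-form solution)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x


# Five noise-free points on y = 0.3*x + 1.
clean = [(float(x), 0.3 * x + 1.0) for x in range(5)]
# Same points, with the last one displaced upwards by 10 (a single outlier).
with_outlier = clean[:-1] + [(4.0, clean[-1][1] + 10.0)]

print(least_squares_line(clean))         # close to (0.3, 1.0)
print(least_squares_line(with_outlier))  # close to (2.3, -1.0)
```

A single displaced point is enough to change both parameters substantially, because its squared residual enters the criterion without any bound.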

In the present work, we apply to this problem the common product
similarity developed from the multiset-based common
product in Section 2.

The basic idea is to apply the common product
similarity to gauge the similarity between each candidate
solution

$$y_c = m_c x_o + c_c \tag{11}$$

and the $y_o$ coordinate of each of the N original points
$(x_o, y_o)$, i.e. $s(y_c, y_o, w, D)$.

The optimization criterion therefore consists of
maximizing the sum of the respective common product
similarities, i.e.:

Find $m_c$ and $c_c$ that maximize:

$$\sum_{i=1}^{N} s(y_{c,i}, y_{o,i}, w, D), \quad i = 1, 2, \ldots, N \tag{12}$$

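
The criterion above can be sketched in Python as an accumulated similarity over all points. The sketch assumes the saturating form min(1, |y_o - y_c| / w) of the distance, and the sample points (including the outlier) are invented for illustration.

```python
def similarity(y_c: float, y_o: float, w: float, D: int) -> float:
    """Order-D similarity between a candidate value y_c and an observed y_o."""
    return (1.0 - min(1.0, abs(y_o - y_c) / w)) ** D


def accumulated_similarity(m_c: float, c_c: float,
                           points: list[tuple[float, float]],
                           w: float, D: int) -> float:
    """Sum of similarities between the candidate line and all observed points."""
    return sum(similarity(m_c * x + c_c, y, w, D) for x, y in points)


# Ten points on y = 0.3*x + 1 plus one outlier far outside the window.
points = [(float(x), 0.3 * x + 1.0) for x in range(10)]
points.append((5.0, 40.0))

print(accumulated_similarity(0.3, 1.0, points, 1.0, 2))  # outlier contributes 0
print(accumulated_similarity(0.0, 0.0, points, 1.0, 2))  # poor candidate
```

Because the outlier lies outside the window of width 2w around the candidate line, it adds nothing to the score, regardless of how far away it is.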


An especially important issue here regards the method
to be applied for implementing the above optimization.
There are several possibilities, but gradient ascent has
been experimentally found to be particularly effective
given the typically smooth energy landscapes that are
obtained for relatively large values of w.
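
A possible gradient ascent loop is sketched below in Python. The text does not specify how the gradient is computed or how the step size is chosen, so the central finite differences, the step halving on failure, and all default values are assumptions of this sketch rather than the procedure actually used.

```python
def similarity(y_c, y_o, w, D):
    """Order-D similarity, assuming the distance min(1, |y_o - y_c| / w)."""
    return (1.0 - min(1.0, abs(y_o - y_c) / w)) ** D


def objective(m, c, points, w, D):
    """Accumulated similarity between the line y = m*x + c and the points."""
    return sum(similarity(m * x + c, y, w, D) for x, y in points)


def gradient_ascent(points, w, D, m=0.0, c=0.0, step=0.01, iters=500, h=1e-4):
    """Climb the similarity surface from (m, c); returns (m, c, objective)."""
    best = objective(m, c, points, w, D)
    for _ in range(iters):
        # Central finite-difference estimate of the gradient.
        gm = (objective(m + h, c, points, w, D)
              - objective(m - h, c, points, w, D)) / (2 * h)
        gc = (objective(m, c + h, points, w, D)
              - objective(m, c - h, points, w, D)) / (2 * h)
        m_new, c_new = m + step * gm, c + step * gc
        value = objective(m_new, c_new, points, w, D)
        if value > best:
            m, c, best = m_new, c_new, value   # accept only improving moves
        else:
            step *= 0.5                        # otherwise shorten the step
    return m, c, best


points = [(float(x), 0.3 * x + 1.0) for x in range(11)]
print(gradient_ascent(points, w=1.0, D=2, m=0.2, c=1.2))
```

Accepting only improving moves keeps the accumulated similarity from decreasing along the trajectory. A starting point whose window contains no data has zero gradient, so the loop would stay put there; larger values of w reduce that risk.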

Figure 4 illustrates a set of points to be fitted by a
straight line.

Figure 4: A set of points to be fitted by a straight line y = mx + c.
Also shown in salmon is the result of the fit by using w = 0.5 and
D = 2. These points were obtained by adding symmetric uniform
noise to a straight line defined by $m_o = 0.3$ and $c_o = 1$.


The distribution of the similarity values along the
parameter space [m, c] can be effectively appreciated when
displayed as an image. Figure 5 illustrates the
accumulated similarity along the parameter space obtained by
considering w = 1 and D = 2 in the set of points in Fig. 4.
For simplicity's sake, only the portion of the parameter
space respective to $m \in [-1, 1]$ is henceforth shown.
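
The accumulated similarity over the parameter space can be tabulated on a regular grid, as in the Python sketch below. The noise-free sample points, the grid resolution and the range chosen for c are assumptions made for illustration; only the range of m follows the text.

```python
def similarity(y_c, y_o, w, D):
    """Order-D similarity, assuming the distance min(1, |y_o - y_c| / w)."""
    return (1.0 - min(1.0, abs(y_o - y_c) / w)) ** D


def accumulated_similarity(m, c, points, w, D):
    """Sum of similarities between the line y = m*x + c and all points."""
    return sum(similarity(m * x + c, y, w, D) for x, y in points)


def similarity_surface(points, w, D, m_values, c_values):
    """Rows indexed by c, columns by m: one accumulated similarity per cell."""
    return [[accumulated_similarity(m, c, points, w, D) for m in m_values]
            for c in c_values]


points = [(float(x), 0.3 * x + 1.0) for x in range(11)]
m_values = [(i - 10) / 10 for i in range(21)]   # m in [-1, 1]
c_values = [j / 10 for j in range(21)]          # c in [0, 2] (assumed range)
surface = similarity_surface(points, 1.0, 2, m_values, c_values)

# Locate the highest cell of the surface.
best_value, best_m, best_c = max(
    (surface[j][i], m_values[i], c_values[j])
    for j in range(len(c_values)) for i in range(len(m_values)))
print(best_value, best_m, best_c)
```

For noise-free points the highest cell coincides with the generating parameters; the nested list can be passed to any image plotting routine to visualize the surface.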


Despite the relatively high level of noise in the original
points, a remarkably smooth surface has been obtained.
The surface of accumulated similarities presents a peak
that coincides accurately with the original line parameters
$m_o = 0.3$ and $c_o = 1$. Such a smooth surface with such a
well defined peak tends to substantially favor optimization
methods such as gradient ascent.

Figure 6 illustrates the parameter space obtained for
w = 3 and a substantially higher D = 10. The obtained
similarity surface is still quite smooth, though with
a sharper peak implied by D = 10. This type of surface
would still be suitable for gradient ascent approaches.

Figure 5: The parameter space [m, c] obtained for the set of points
in Fig. 4 with respect to w = 1 and D = 2. A remarkably smooth
surface has been obtained even considering the relatively high level
of noise in the original points, which substantially favors
optimization approaches such as gradient ascent. Note that the
obtained surface contains many more levels than shown in this
figure; the level sets have been imposed in order to give a better
perspective of the geometry of the surface. The point in cyan
indicates the position of the optimal, original parameters $m_o = 0.3$ and
$c_o = 1$.

Figure 6: The parameter space [m, c] obtained for the set of points
in Fig. 4 with respect to w = 3 and D = 10. A still remarkably
smooth surface has been obtained, though with a sharper peak
than in the previous case. This example illustrates the effect of the
parameter D in controlling how strict the similarity
quantification is.


To complete our examples, we present in Figure 7 the
similarity surface on the parameter space obtained for
w = 0.3 and D = 1, which implies a much narrower
window than in the two above examples. Remarkably,
the obtained similarity surface was still, at least for
the considered set of points, very smooth, with an
even sharper peak around the optimal original
parameters shown in cyan.

Figure 7: The parameter space [m, c] obtained for the set of points
in Fig. 4 with respect to w = 0.3 and D = 1. A still remarkably
smooth surface has been obtained, though with an even sharper
peak than in the previous cases.


Figure 8 illustrates the application of the gradient
ascent for w = 1 and D = 2. The trajectory of the
gradient, shown in magenta, indicates a relatively direct
convergence to the peak. As a consequence of the loss of
information implied by the high level of noise added to
the original points, the found peak is not perfectly
identical to the original noiseless line parameters $m_o = 0.3$ and
$c_o = 1$.


Figure 9 depicts the result obtained for w = 0.3 and
D = 1. Again, the gradient ascent was relatively direct,
reaching a peak value very similar to that obtained in the
previous gradient ascent example.


Figure 8: The trajectory of the gradient ascent obtained for w = 1
and D = 2 with respect to the points in Fig. 4. In addition to
the relatively direct trajectory towards the peak, the obtained
values, m = 0.347 and c = 0.985, are very close to the parameters
of the original noiseless line from which the points were obtained
through the addition of a high level of noise.

Figure 9: The trajectory of the gradient ascent obtained for w = 0.3
and D = 1 with respect to the points in Fig. 4. In addition to the
relatively direct trajectory towards the peak, the obtained values,
m = 0.34 and c = 0.994, are very close to the parameters of the
original noiseless line.


4 Concluding Remarks


Developed by using multiset concepts, the common product
has been verified to allow enhanced performance in a large
number of situations, mainly as a consequence of
implementing a stricter quantification of similarity than
the extensively used cosine similarity, inner product, and
cross-correlation.

In the present work, we started with the common product
and developed, through successive modifications, a
respective distance, here called the common product
distance, which has several interesting features including
multiscale operation (controlled by the parameter w),
adjustable sharpness (through parameter D), ability to
reduce the impact of outlier points, as well as relative
conceptual and arithmetic simplicity.

It was then shown that this distance could be brought
back to the similarity perspective with some interesting
features, such as operating completely in the local scale
defined by the width 2w. This feature is of particular
interest in case the outliers are to be completely avoided,
as may happen when performing line fitting.

The potential of the proposed concepts was then
illustrated with respect to the important problem of line
fitting, with promising results that make
this method interesting in certain applications, such as
when dealing with high levels of noise or avoiding
outliers.

The proposed curve fitting methodology has not been
compared to the traditional least mean square approach,
or even the absolute value related method, because each
of these approaches is here understood to serve specific
demands in the light of its intrinsic properties.

Further developments include a more systematic
evaluation of the method with respect to varying levels and
types of noise, as well as the effect of the involved
parameters w and D. It would also be interesting to develop
further the possibility of using the Hough transform as a
means of finding the best fit, and compare this with the
gradient ascent approach. The problem of local peaks in
the parameter space also deserves further investigation.